Hi SJH,
I don't think you are missing anything. Off-chip accesses are
horribly slow compared to on-chip activity. I find it ironic that
the DSP can theoretically do 100+ 32-bit floating-point
operations in the time it takes to set a bit in the FPGA.
The EMIF (External Memory Interface) runs at 100MHz, not 200MHz.
We set the Asynchronous Write timing for 3 Write Setup cycles, 15
Write Strobe cycles, and 1 Write Hold cycle. At 10ns per EMIF
cycle, those 19 cycles should come to 190ns.
And there are always extra penalties for bus turnaround and
whatnot.
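As a quick sanity check on that arithmetic, here is a minimal C sketch. The function names are hypothetical helpers for illustration, not part of any TI library. The 2MHz square wave and the two writes per period are taken from SJH's loop quoted below.

```c
#include <assert.h>

/* Hypothetical helper: length of one asynchronous EMIF write in ns.
 * setup, strobe and hold are counted in EMIF clock cycles. */
unsigned emif_write_ns(unsigned setup, unsigned strobe,
                       unsigned hold, unsigned emif_mhz)
{
    assert(emif_mhz > 0);                 /* avoid dividing by zero */
    return (setup + strobe + hold) * 1000u / emif_mhz;
}

/* Hypothetical helper: time per access implied by a measured square
 * wave that needs accesses_per_period writes for each full period. */
unsigned measured_access_ns(unsigned wave_khz,
                            unsigned accesses_per_period)
{
    assert(wave_khz > 0 && accesses_per_period > 0);
    return 1000000u / wave_khz / accesses_per_period;
}
```

emif_write_ns(3, 15, 1, 100) gives 190 and measured_access_ns(2000, 2) gives 250, which leaves about 60ns per write for turnaround and other overhead.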
I have found that external writes don't cost much if they can be
spread out. It's like sending data through a pipeline. Once a
write is sent, the DSP can go on. However, if the next write is
issued before the pipeline is empty, the DSP stalls. Reads always
stall because the DSP must wait to receive the data.
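A rough C sketch of what "spreading out" writes could look like: put on-chip work between consecutive external writes instead of issuing them back to back. The function name, the port pointer and the checksum are all made up for illustration. Whether the overlap really hides the EMIF latency depends on how much work sits between the writes, and would need to be measured on the actual hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: write n bytes to a memory-mapped port and
 * compute a checksum of the data in between the writes. */
unsigned send_and_sum(volatile unsigned char *port,
                      const unsigned char *data, size_t n)
{
    unsigned sum = 0;
    size_t i;

    assert(port != NULL && data != NULL);
    for (i = 0; i < n; ++i) {
        *port = data[i];   /* external write: issued, then the CPU moves on */
        sum += data[i];    /* on-chip work that can run while the write drains */
    }
    return sum;
}
```

A single add is far too little work to cover a 190ns write, so in practice the filler would have to be something much larger; the sketch only shows the ordering.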
I doubt it will help much, but you should probably look at the
assembly code to see what the compiler actually generated. Maybe
try optimization level 3 with -O3 and remove the --opt_for_space
option? Or is that removing it?
Regards
TK
Hi Tom,
I know this subject has been covered before,
but I was wondering if I'm missing something.
The following loop outputs a 2MHz square wave,
which seems to be rather slow considering it
is doing practically nothing except toggle the
FPGA location. I'm using cl6x with options
-mv6710 -ml3 -mu -O2 --opt_for_space
unsigned short spi_rw(register unsigned short mosi)
{
    unsigned i;
    register unsigned short miso = 0; /* not read here; timing test only */
    for (i = 0; i < 16; ++i) {
        *sclk_fpgaset = sclk_hi;
        *sclk_fpgaclr = sclk_lo;
    }
    return miso;
}
sclk_fpgaset/clr are macros which are constant
pointers to the appropriate memory mapped
locations, and sclk_lo/hi are constants.
If I actually try to read/write data, then it
slows down to about 800kHz clock rate:
unsigned short spi_rw(register unsigned short mosi)
{
    unsigned i;
    register unsigned short miso = 0;
    for (i = 0; i < 16; ++i) {
        *sclk_fpgaset = sclk_hi;
        //if (mosi & 0x8000)
        //    *mosi_fpgaset = mosi_hi;
        //else
        //    *mosi_fpgaclr = mosi_lo;
        // branch free...
        *(volatile unsigned char *)(0x91000452 ^ (mosi>>9 & 0x40)) =
            0xF7 ^ ((unsigned char)0 - (unsigned char)(mosi>>15));
        mosi <<= 1;
        // read...
        miso = miso<<1 | *miso_fpgard>>2 & 1;
        *sclk_fpgaclr = sclk_lo;
    }
    return miso;
}
So I can't work out why any read/write to the FPGA
pin is taking 250ns. Are there a lot of wait states
added to FPGA access in that 0x90000000 block? The
DSP has a 5ns cycle time, and it can stall up to 6
cycles if the pipeline requires, so the slowest
instruction should be 30ns. So it looks like roughly
200ns of overhead accessing the FPGA?
Note that that horrible branch-free code didn't
make a noticeable difference compared with the more
obvious code, so I don't think it's the code which
is slow.
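For anyone puzzling over the branch-free line, here is a small host-side C sketch that evaluates the same two expressions without touching hardware. The struct and function names are hypothetical. Which of the two resulting addresses corresponds to mosi_fpgaset and which to mosi_fpgaclr is not stated above, so the sketch only reports the raw numbers.

```c
#include <assert.h>

/* Hypothetical container for the address/value pair of one write. */
struct fpga_write {
    unsigned long addr;
    unsigned char value;
};

/* Evaluate the branch-free expressions for the current MSB of mosi. */
struct fpga_write mosi_write(unsigned short mosi)
{
    struct fpga_write w;
    /* bit 15 of mosi lands on bit 6 after >>9 and flips 0x52 to 0x12 */
    w.addr = 0x91000452UL ^ (unsigned long)(mosi >> 9 & 0x40);
    /* 0 - 1 is all ones, so the XOR inverts 0xF7 to 0x08 when the MSB is set */
    w.value = (unsigned char)(0xF7 ^ ((unsigned char)0 -
                                      (unsigned char)(mosi >> 15)));
    return w;
}

/* Small self-check of both cases. */
void check_mosi_write(void)
{
    struct fpga_write hi = mosi_write(0x8000);
    struct fpga_write lo = mosi_write(0x0000);
    assert(hi.addr == 0x91000412UL && hi.value == 0x08);
    assert(lo.addr == 0x91000452UL && lo.value == 0xF7);
}
```

So each loop iteration performs exactly one byte write either way, which fits the observation that removing the branch did not change the timing.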
I'm trying to read/write a 16-bit value every tick
(90us), but it's taking about 20us, which is roughly
22% of the CPU. It would be nice to be able to cut
this down to 10% CPU.
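The budget arithmetic behind those figures, as a minimal C sketch. The helper names are hypothetical; the inputs (16 bits, 800kHz, 90us tick, 10% target) are the numbers quoted above.

```c
#include <assert.h>

/* Time to clock out `bits` bits at clock_khz, in microseconds. */
unsigned transfer_us(unsigned bits, unsigned clock_khz)
{
    assert(clock_khz > 0);
    return bits * 1000u / clock_khz;
}

/* CPU load in percent (rounded down) for busy_us out of every tick_us. */
unsigned cpu_load_percent(unsigned busy_us, unsigned tick_us)
{
    assert(tick_us > 0);
    return busy_us * 100u / tick_us;
}

/* Slowest bit clock in kHz (rounded up) that fits `bits` bits into budget_us. */
unsigned min_clock_khz(unsigned bits, unsigned budget_us)
{
    assert(budget_us > 0);
    return (bits * 1000u + budget_us - 1u) / budget_us;
}
```

transfer_us(16, 800) is 20 and cpu_load_percent(20, 90) is 22. A 10% share of a 90us tick is 9us, and min_clock_khz(16, 9) is 1778, so the bit clock would need to reach about 1.78MHz. The toggle-only loop at 2MHz would take transfer_us(16, 2000) = 8us, but it does no data access.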
Regards,
SJH